
    D-TrAttUnet: Dual-Decoder Transformer-Based Attention Unet Architecture for Binary and Multi-classes Covid-19 Infection Segmentation

    In the last three years, the world has been facing a global crisis caused by the Covid-19 pandemic. Medical imaging has played a crucial role in the fight against this disease and in saving human lives. Indeed, CT scans have proved their efficiency in diagnosing, detecting, and following up Covid-19 infection. In this paper, we propose a new Transformer-CNN based approach for Covid-19 infection segmentation from CT slices. The proposed D-TrAttUnet architecture has an Encoder-Decoder structure, with a compound Transformer-CNN encoder and Dual-Decoders. The Transformer-CNN encoder is built using Transformer layers, UpResBlocks, ResBlocks and max-pooling layers. The Dual-Decoder consists of two identical CNN decoders with attention gates. The two decoders segment the infection and the lung regions simultaneously, and the losses of the two tasks are combined. The proposed D-TrAttUnet architecture is evaluated for both Binary and Multi-class Covid-19 infection segmentation. The experimental results prove the efficiency of the proposed approach in dealing with the complexity of the Covid-19 segmentation task from limited data. Furthermore, the D-TrAttUnet architecture outperforms three baseline CNN segmentation architectures (Unet, AttUnet and Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and CopleNet) in both Binary and Multi-class segmentation tasks.
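    The joint training of the two decoders can be sketched as a weighted sum of per-task losses. The binary cross-entropy terms and the mixing weight `alpha` below are illustrative assumptions, not the exact formulation from the paper:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and a binary mask."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def joint_loss(infection_pred, infection_gt, lung_pred, lung_gt, alpha=0.5):
    """Combine the infection-decoder and lung-decoder losses into one objective.

    alpha is a hypothetical mixing weight; the abstract only states that the
    two task losses are combined.
    """
    return alpha * bce(infection_pred, infection_gt) + (1.0 - alpha) * bce(lung_pred, lung_gt)
```

    With perfect predictions on both tasks the joint loss approaches zero, while poor predictions on either task raise it, which is what couples the two decoders during training.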

    A study on different experimental configurations for age, race, and gender estimation problems

    This paper presents a detailed study of different algorithmic configurations for estimating soft biometric traits. In particular, a recently introduced common framework is the starting point of the study: it includes an initial facial detection step, the subsequent description of facial traits, a data-reduction step, and a final classification step. The algorithmic configurations are characterized by different descriptors and different strategies to build the training dataset and to scale the data input to the classifier. Experiments have been carried out both on publicly available datasets and on image sequences specifically acquired to evaluate the performance under real-world conditions, i.e., in the presence of scaling and rotation.
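    The four-stage framework (detection, trait description, data reduction, classification) can be sketched with deliberately simple stand-ins. The flattened-pixel descriptor, SVD-based PCA, and nearest-centroid classifier below are illustrative placeholders, not the actual components evaluated in the study:

```python
import numpy as np

def describe(face_img):
    # Description stage: here simply a flattened, zero-mean pixel vector;
    # the study compares several dedicated facial descriptors instead.
    v = face_img.ravel().astype(float)
    return v - v.mean()

def fit_pca(X, k):
    # Data-reduction stage: PCA via SVD of the centered descriptor matrix.
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def project(x, mu, components):
    return components @ (x - mu)

def nearest_centroid(train_Z, train_y, z):
    # Classification stage: assign the label of the closest class centroid
    # in the reduced space.
    labels = sorted(set(train_y))
    centroids = {c: train_Z[np.array(train_y) == c].mean(axis=0) for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(z - centroids[c]))
```

    Swapping the descriptor, the reduction step, or the classifier independently is exactly the kind of configuration comparison the study performs.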

    CNN based facial aesthetics analysis through dynamic robust losses and ensemble regression

    In recent years, estimating the beauty of faces has attracted growing interest in the fields of computer vision and machine learning. This is due to the emergence of face beauty datasets (such as SCUT-FBP, SCUT-FBP5500 and KDEF-PT) and the prevalence of deep learning methods in many tasks. The goal of this work is to leverage advances in deep learning architectures to provide stable and accurate face beauty estimation from static face images. To this end, our proposed approach makes three main contributions. To deal with the complicated high-level features associated with the FBP problem by using more than one pre-trained Convolutional Neural Network (CNN) model, we propose an architecture with two backbones (2B-IncRex). In addition to 2B-IncRex, we introduce a parabolic dynamic law to control the behavior of the robust loss parameters during training; these robust losses are ParamSmoothL1, Huber, and Tukey. As a third contribution, we propose an ensemble regression based on five regressors, namely Resnext-50, Inception-v3 and three regressors based on our proposed 2B-IncRex architecture; these models are trained with the dynamic loss functions Dynamic ParamSmoothL1, Dynamic Tukey, Dynamic ParamSmoothL1, Dynamic Huber, and Dynamic Tukey, respectively. To evaluate the performance of our approach, we used two datasets: SCUT-FBP5500 and KDEF-PT. The SCUT-FBP5500 dataset offers two evaluation scenarios provided by the database developers: a 60-40% split and five-fold cross-validation. Our approach outperforms state-of-the-art methods on several metrics in both evaluation scenarios of SCUT-FBP5500. Moreover, experiments on the KDEF-PT dataset demonstrate the efficiency of our approach for estimating facial beauty using transfer learning, despite the presence of facial expressions and limited data. These comparisons highlight the effectiveness of the proposed solutions for FBP. They also show that the proposed dynamic robust losses lead to more flexible and accurate estimators. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
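    A dynamic robust loss of this kind can be illustrated with the Huber loss: its robustness parameter starts large (so the loss behaves quadratically over a wide residual range) and shrinks over training, down-weighting outliers more aggressively in later epochs. The quadratic schedule and its endpoint values below are assumed for illustration, not the coefficients of the parabolic law from the paper:

```python
import numpy as np

def huber(residual, delta):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(residual)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

def parabolic_delta(epoch, total_epochs, d_start=2.0, d_end=0.5):
    # Illustrative parabolic schedule: delta shrinks quadratically from
    # d_start to d_end, making the loss increasingly robust to outliers.
    t = epoch / total_epochs
    return d_end + (d_start - d_end) * (1.0 - t) ** 2
```

    At each epoch the trainer would evaluate `huber(residuals, parabolic_delta(epoch, total_epochs))`, so the same regressor sees a gradually more outlier-resistant objective.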

    When I Look into Your Eyes: A Survey on Computer Vision Contributions for Human Gaze Estimation and Tracking

    The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to find where a person is looking) is reported in the scientific literature as gaze tracking. This has become a very hot topic in computer vision during the last decades, with a surprising and continuously growing number of application fields. A very long journey has been made from the first pioneering works, and the continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks revolutionized the whole machine learning area, and gaze tracking as well. In this arena, it is increasingly useful to find guidance in survey/review articles that collect the most relevant works and lay out clear pros and cons of existing techniques, also by introducing a precise taxonomy. Such manuscripts allow researchers and technicians to choose the best way to move towards their application or scientific goals. In the literature there exist holistic and specifically technological survey documents (even if not updated), but, unfortunately, there is no overview discussing how the great advancements in computer vision have impacted gaze tracking. Thus, this work represents an attempt to fill this gap, also introducing a wider point of view that leads to a new taxonomy (extending the consolidated ones) by considering gaze tracking as a more exhaustive task that aims at estimating the gaze target from different perspectives: from the eye of the beholder (first-person view), from an external camera framing the beholder, from a third-person view looking at the scene where the beholder is placed, and from an external view independent of the beholder.

    Microplastic Identification via Holographic Imaging and Machine Learning

    Microplastics (MPs) are a major environmental concern due to their possible impact on water pollution, wildlife, and the food chain. Reliable, rapid, and high-throughput screening of MPs from the other components of a water sample after sieving and/or digestion is still a highly desirable goal, to avoid cumbersome visual analysis by expert users under the optical microscope. Here, a new approach is presented that combines 3D coherent imaging with machine learning (ML) to achieve accurate and automatic detection of MPs in filtered water samples over a wide size range at the microscale. The water pretreatment process eliminates sediments and aggregates that fall outside the analyzed range; however, it is still necessary to clearly distinguish MPs from marine microalgae. It is shown that, by defining a novel set of distinctive "holographic features," it is possible to accurately identify MPs within the defined analysis range. The process is specifically tailored to characterizing the MPs' "holographic signatures," thus boosting classification performance and reaching accuracy higher than 99% in classifying thousands of items. The ML approach, in conjunction with holographic coherent imaging, is able to identify MPs independently of their morphology, size, and type of plastic material.

    Automatic Joint Attention Detection During Interaction with a Humanoid Robot

    Joint attention is an early-developing social-communicative skill in which two people (usually a young child and an adult) share attention with regard to an interesting object or event, by means of gestures and gaze; its presence is a key element in evaluating therapy in the case of autism spectrum disorders. In this work, a novel automatic system is presented that detects joint attention using a completely non-intrusive depth camera installed on the room ceiling. In particular, in a scenario where a humanoid robot, a therapist (or a parent) and a child are interacting, the system can detect the social interaction between them. Specifically, the overhead depth camera is employed to detect, first of all, the arising event to be monitored (performed by a humanoid robot) and, subsequently, to detect the eventual joint-attention mechanism by analyzing head orientation. The system operates in real time, providing the therapist with a completely non-intrusive instrument to help evaluate the quality and the precise modalities of this predominant feature during the therapy session.
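    The head-orientation check at the core of such a detector can be sketched geometrically: joint attention is flagged when both participants' head directions point at the monitored event within an angular tolerance. The 2D floor-plane geometry and the 15-degree threshold below are illustrative assumptions, not parameters reported for the actual system:

```python
import numpy as np

def looking_at(head_pos, head_dir, target_pos, max_angle_deg=15.0):
    """True if the head orientation points at the target within a tolerance."""
    to_target = np.asarray(target_pos, float) - np.asarray(head_pos, float)
    to_target /= np.linalg.norm(to_target)
    d = np.asarray(head_dir, float)
    d = d / np.linalg.norm(d)
    cos_a = float(np.clip(np.dot(d, to_target), -1.0, 1.0))
    return np.degrees(np.arccos(cos_a)) <= max_angle_deg

def joint_attention(child, therapist, event_pos, **kw):
    # child / therapist: (position, head-direction) pairs, as estimated
    # from the overhead depth camera.
    return looking_at(*child, event_pos, **kw) and looking_at(*therapist, event_pos, **kw)
```

    Run per frame on the tracked head poses, this kind of test yields the real-time joint-attention signal described above.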